This exercise was created for a workshop held on Feb 1st at the annual conference for the International Society of Disease Surveillance on obtaining and examining Wildfire Smoke data. The GitHub Repository with this code can be found at .

Workflow

To perform this analysis we’ll be combining population data with wildfire smoke data (available daily) in order to approximate where, and how many people on the ground might have been exposed. Finally we’ll consider some population characteristics to identify disparate burdens to vulnerable populations.

Let’s begin by loading some selected libraries that we’ll use in this workshop. There are three included here:

  1. simple features for preforming spatial operations
  2. tidyverse partly out of habit, but ever useful for wrangling and visualizing data
  3. data table for speeding up the process of querying and manipulating large data sets
# install.packages(c("sf","tidyverse","date.table"))
library(sf)
library(tidyverse)
library(data.table)

Population Data

To know where people live and how many live there, we’ll need the population data for our state. For this we’ll use the US Census state population centers - these are population-weight centers to each tract in the state. For California we use the state FIPS code ‘06’, you can change the code at the end of the URL below to use your state, and we’ll define it explicitly so we can call on it later. We’ll create the URL on the fly using the paste0 function. Note: fread is the default reader from the data.table package and it allows us to read in this text file directly from a url.

stateFIPs <- "06"
TRurl <- paste0("https://www2.census.gov/geo/docs/reference/cenpop2010/tract/CenPop2010_Mean_TR",stateFIPs,".txt")
TRcenters <- fread(TRurl)

Let’s take a look at the data

TRcenters

Note that when reading in the data R automatically translated the FIPS codes to numbers and leading zeros were dropped

class(TRcenters$STATEFP)
[1] "integer"

Alternatively we may wish to read in the geography FIPS as characters and combine them into a single string to more easily match them up with other data later. We’ll do this by defining the class of the columns as character using colClasses within the fread function. Then combining those fields without separators, using paste0. The %>% operator is used with the tidyverse for piping commands, and mutate appends the dataset with new variables that we define - in this case a long identifier for the tract called GEOID.

TRcentersAlt <- fread(TRurl, 
                      colClasses = c("character",
                                     "character",
                                     "character",
                                     "integer",
                                     "double",
                                     "double")) %>%
                mutate(GEOID = paste0(STATEFP, COUNTYFP, TRACTCE))

Let’s see how that looks

TRcentersAlt
class(TRcentersAlt$STATEFP)
[1] "character"

Now we have a table of tract info, but we need it to be a spatial object if we are going to be able to combine it with smoke information. We use the sf package for this. We’ll use the Lat and Lon field already in the data and assign it the Coordinate Reference System (CRS) corresponding to WGS84. Finally, we can plot the points and show population to verify that everything worked.

TRcentersSP <- st_as_sf(TRcenters,
                        coords = c("LONGITUDE", "LATITUDE"),
                        crs = 4326) 

plot(TRcentersSP["POPULATION"])

Wildfire Smoke Data

There are many potential sources for such data including monitor data, modeled data, and satellite (remotely sensed) data. For this exercise we will be looking at data from the National Oceanic and Atmospheric Administration’s Office of Satellite and Product Operations. Specifically, NOAA’s Hazard Mapping System (HMS) Smoke Product.

Technical Documentation of the HMS Products

Here is a recent paper looking at the 2015 wildfire season in California which found * Cardiovascular and Cerebrovascular Emergency Department Visits Associated With Wildfire Smoke Exposure in California in 2015


Let’s first look at the recent (6 month) archive of HMS data available to download. It can be found here https://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/. You can see that the first date for which smoke data are available is in July 2018. There were several large fires in California this summer and fall and we’ll be able to examine them with this 6 month archive (keep in mind if accessing this exercise after the Feb 1st ISDS Workshop the available dates will be different).

When looking at the archive, you can see that files for each day are in the ESRI shapefile format, where each day’s spatial data consists of 3 files. In this chunk we’ll download the files (.dbf,.shp, .shx) and read them in as a single spatial object. We use the base function substr to separate the full ‘day’ into parts that we may want to use later. Finally, we’ll look at the fields in the data and produce a simple plot of it.

HMSday <- 20180801 # define the day in the format it appears in the  
  
year <- substr(HMSday, 1,4)
month <- substr(HMSday, 5,6)
day <- substr(HMSday, 7,8)

directory <- tempdir() # create a temporary directory to save the files as they're downloaded. 
  
setwd(directory) # set the working directory to the temp directory we just created
shpFiles <- list(paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".dbf"),
                 paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shp"),
                 paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shx"))

shpNames <- list("HMSGIS.dbf","HMSGIS.shp","HMSGIS.shx")
  
GISfiles <- mapply(download.file, # the function to execute
                   shpFiles, # a list of the urls to download
                   shpNames, # a list of the names of the destinations for the downloaded files
                   MoreArgs = list(method = "auto", mode = "wb")) # additional arguments for download.file function
  
Smk <- st_read(dsn = "HMSGIS.shp") %>%  st_set_crs(4326) # read in the shapefile and 
Smk

There are 5 fields in the shapefile. Density is the field that contains the information about the concentration of the smoke observed. We’ll use that to take a preliminary look at the data. Before we plot it, we’ll reassign the values the HMS uses for Density to labels that make more sense for us.


Smk <- Smk %>%
  mutate(year = substr(HMSday, 1,4),
        month = substr(HMSday, 5,6),
        day = substr(HMSday, 7,8),
        date = paste0(year, month, day),
        smoke = factor(ifelse(Density %in% c(5, "5.000"), "light", 
                             ifelse(Density %in% c(27, "27.000") ,"heavy", 
                                    ifelse(Density %in% c(16, "16.000"), "medium", "missing"))),
                levels=c("light","medium","heavy","missing"))
         )

 plot(Smk["smoke"], 
      axes = TRUE, # include axes labels in the plot
      key.pos = 1) # place legend at the bottom

The last thing left to do is to combine the population data and the smoke information. This is a spatial operation, where we want to know which smoke plumes are over the population centers that we first brought in. To do this we’ll use the st_intersection function within the sf package. We’ll also use a special function from the lwgeom package to ensure the the smoke polygons are valid and so that we do not encounter unnecessary errors. The out

# install.packages("lwgeom")
library(lwgeom)

SmkTractDayInt <- st_intersection(st_make_valid(Smk),
                                  TRcentersSP) %>%
                        select(date,
                               year,
                               month,
                               day,
                               smoke,
                               STATEFP,
                               COUNTYFP,
                               TRACTCE,
                               POPULATION)
SmkTractDayInt

In examining the result we can see that a single tract from our population centers data can appear in the result multiple times, either for the same smoke level or for different smoke categories. This is because plumes in the HMS data can be stored as separate shapes, depending on the density of the smoke and the time/satellite from which it was derived. Conversely, some tracts do not appear in the result at all. We’ll deal with that in a moment. For now, we’ll make the decision here to consider an intersection with a plume to be an exposure for that tract and day. This will help to simplify the data and give us one observation for each tract-day-smoke combination in our result.

      
st_geometry(SmkTractDayInt) <- NULL # remove the spatial information for the moment and if we want to work with it we can comment this line out
  
singleDay <- SmkTractDayInt  %>%
                mutate(yn = 1) %>% # since every observataion in our result is an instance of smoke with people, we'll give them all a value of 1
                group_by(date,
                         year,
                         month,
                         day,
                         smoke,
                         STATEFP,
                         COUNTYFP,
                         TRACTCE,
                         POPULATION) %>% # then we'll group by these variables 
                summarise(yn = mean(yn)) %>% # and our yes/no marker will be averaged for instances of common smoke categories
                spread(key = smoke,value = yn)  # lastly, let's restructure the data to a wider format, where each smoke category has its own column
                                                # this is not exactly tidy-compliant, but we'll be using this soon
                
unlink(paste0(tempdir(),'/*'))     # clear temp folder
      

Now we have a dataset for a single day and state, where we could quickly compute the number of people under different smoke categories on a single day.


sum(filter(singleDay, heavy == 1)$POPULATION)  # here is that quick calculation using the tidyr notation
[1] 435066
as.data.table(singleDay)[heavy ==1, sum(POPULATION)] # here is that quick calculation using the data.table notation
[1] 435066

Alternatively, we may want to look at which places have the most people exposed, or view all of the tracts under heavy smoke on this day.


filter(singleDay, heavy ==1) %>% arrange(POPULATION)
NA

Wow, a lot of different counties in California had thousands of people exposed on this day.

Recall that when we intersected the tract centroids and the smoke layer we lost tracts with no smoke on that day. It may also be useful to have these, if to only make it easier to see the entire state and population centers when mapping. But we may also want to quantify the percent of the total state or county population exposed. To do this we’d like for the result of our intersection to contain some information for every combination of date-tract-smoke.

singleDayFull <- left_join(TRcenters, singleDay) %>% 
                replace_na(list(date = HMSday, 
                                year = substr(HMSday, 1,4),
                                month = substr(HMSday, 5,6),
                                day = substr(HMSday, 7,8),
                                light = 0, 
                                medium = 0, 
                                heavy = 0))
Joining, by = c("STATEFP", "COUNTYFP", "TRACTCE", "POPULATION")

Now we have a table of the full list of our state’s tracts, the number of people in that tract, and the smoke level they experienced on that day. This is all well and good, but we’d like to make use of computational power to iterate and look at an episode or entire fire season. We already have the commands that we need, we only have to put them into a function that we can apply to several dates or states. The script in this function is just copied (and adapted slightly) from the chunks above.


processHMSday <- function(HMSday){
  
    year <- substr(HMSday, 1,4)
    month <- substr(HMSday, 5,6)
    day <- substr(HMSday, 7,8)
    
    directory <- tempdir() # create a temporary directory to save the files as they're downloaded. 
      
    setwd(directory) # set the working directory to the temp directory we just created
  
    shpFiles <- list(paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".dbf"),
                     paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shp"),
                     paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shx"))
  
    shpNames <- list("HMSGIS.dbf","HMSGIS.shp","HMSGIS.shx")
      
    GISfiles <- mapply(download.file, # the function to execute
                       shpFiles, # a list of the urls to download
                       shpNames, # a list of the names of the destinations for the downloaded files
                       MoreArgs = list(method = "auto", mode = "wb")) # additional arguments for download.file function
      
    Smk <- st_read(dsn = "HMSGIS.shp") %>%  
      st_set_crs(4326) %>%
      mutate(year = substr(HMSday, 1,4),
        month = substr(HMSday, 5,6),
        day = substr(HMSday, 7,8),
        date = paste0(year, month, day),
        smoke = factor(ifelse(Density %in% c(5, "5.000"), "light", 
                             ifelse(Density %in% c(27, "27.000") ,"heavy", 
                                    ifelse(Density %in% c(16, "16.000"), "medium", "missing"))),
                levels=c("light","medium","heavy","missing")))
    
    SmkTractDayInt <- st_intersection(st_make_valid(Smk),
                                      TRcentersSP) %>%
                      select(date,
                             year,
                             month,
                             day,
                             smoke,
                             STATEFP,
                             COUNTYFP,
                             TRACTCE,
                             POPULATION) %>%
                    mutate(yn = 1) %>% 
                    group_by(date,
                             year,
                             month,
                             day,
                             smoke,
                             STATEFP,
                             COUNTYFP,
                             TRACTCE,
                             POPULATION) %>% 
                    summarise(yn = mean(yn)) %>% 
                    spread(key = smoke,value = yn) 
    
    st_geometry(SmkTractDayInt) <- NULL
    
    singleFull <- left_join(TRcenters, SmkTractDayInt) %>% 
                        replace_na(list(date = HMSday, 
                                        year = substr(HMSday, 1,4),
                                        month = substr(HMSday, 5,6),
                                        day = substr(HMSday, 7,8),
                                        light = 0, 
                                        medium = 0, 
                                        heavy = 0))
                    
    unlink(paste0(tempdir(),'/*'))
    
    return(singleFull)
    
}

Now we can run the function for a list of dates and combine them to have a larger table of smoke. We can do this for the first week of the Camp Fire which started on Nov 8, 2018.


From the original tract population centers used earlier we have the total (2010) populations in each tract that we can use to produce estimates of the number of people exposed to heavy wildfire smoke. It may also be useful to understand the vulnerable populations that may be particularly impacted in a single event. For this we can dynamically obtain this data on the fly using the American Community Survey API for 2016. You can see the full list of 2016 ACS variables available through the API here https://api.census.gov/data/2016/acs/acs5/variables.html.

Recall that this study of the 2015 California Wildfire Season found significantly elevated risk of stroke and heart attack for those over 65 one day after exposure to heavy wildfire smoke (using HMS). Let’s begin by grabbing this data from ACS. Notice that a new package is needed to read in the data from the ACS’s API in JSON format.

# install.packages("jsonlite")
library(jsonlite)

variable <- "B09020_001E" # this is a placeholde for the ACS varibale we'll read in, t in this case population over 65
# "B17021_016E" # in poverty and living alone


# first read in the data from the ACS and store as a data frome
#examining the data will show that the first row is the names of the data
elderly <-
    as.data.frame(fromJSON(
      paste0("https://api.census.gov/data/2016/acs/acs5?get=NAME,",variable,"&for=tract:*&in=state:",stateFIPs,"%20county:*"
      ), flatten = T
    )) 

# elderlyName <- fromJSON(paste0("https://api.census.gov/data/2016/acs/acs5/variables/",variable,".json"), flatten = T)$label

# here we can elimiate the uneccessary first row and column AND...
# change the name and type of the variables to match our table of smoke exposures
elderly <- elderly[-1,-1] %>% mutate(estimate = as.integer(as.character(V2)), 
                               STATEFP = as.integer(as.character(V3)),
                               COUNTYFP = as.integer(as.character(V4)),
                               TRACTCE = as.integer(as.character(V5))) %>%
                        select(STATEFP, COUNTYFP, TRACTCE, estimate)

elderly
NA

Next we’ll combine the smoke and vulnerable population data we just grabbed from the ACS. And begin to visualize them.

elderlySmoke <- merge(campFire, elderly)

To begin visualizing the data, let’s first produce a timeline of the number of people exposed. We’ll summarize the data by day.

elderlySmoke %>%
  group_by(day) %>%
  summarise(lightPDs = sum(light*POPULATION, na.rm = T),
            mediumPDs = sum(medium*POPULATION, na.rm = T),
            heavyPDs = sum(heavy*POPULATION, na.rm = T)
  ) %>%
  gather(lightPDs, mediumPDs, heavyPDs, key = smoke, value = PersonDays) %>%
  mutate(smoke = factor(x = smoke, levels = c("lightPDs", "mediumPDs", "heavyPDs"))) %>%
       ggplot(aes(
         x = day,
         y = PersonDays/1000000, 
         fill = smoke
       )) +
       geom_bar(stat="identity", position="dodge") + 
       scale_fill_manual(values = c("yellow","orange","red")) +
       ylab("Millions of Persons Exposed") + 
       xlab("Nov day") 

NA
NA

We could also focus on the elderly exposed over the same period

elderlySmoke %>%
  group_by(day) %>%
  summarise(heavyUnder65= sum(heavy*(POPULATION-estimate), na.rm = T),
            heavyOver65 = sum(heavy*estimate, na.rm = T)
  ) %>%
  gather(heavyUnder65, heavyOver65, key = smoke, value = PersonDays) %>%
  mutate(smoke = factor(x = smoke, levels = c("heavyUnder65", "heavyOver65"))) %>%
       ggplot(aes(
         x = day,
         y = PersonDays/1000000, 
         fill = smoke
       )) +
       geom_bar(stat="identity", position="stack") + 
       scale_fill_manual(values = c("pink","red")) +
       ylab("Millions of Persons Exposed") + 
       xlab("Nov day") 

NA
NA

Another really useful way to look at the data will be on a map. For this we’ll summarize the data by location. Also, where we used plot previously this was fine for getting a sense of the information but not very useful for actually interacting with it. Here, we’ll implement a package that uses the java script library Leaflet. This will produce a map that is much more familiar to web users.

# install.packages("leaflet")
library(leaflet)

# first sumarize the data we want by location, aggregating person days of exposure by tract

mapData <- elderlySmoke %>%
  group_by(STATEFP, COUNTYFP, TRACTCE, LATITUDE, LONGITUDE) %>%
  summarise(ExposedPOP = sum(heavy*POPULATION, na.rm = T),
            ExposedPOP65 = sum(heavy*estimate, na.rm = T)
  )

# define the palette for the results showing full population 
palPOP <- colorQuantile(palette = "Blues", 
                        domain = mapData$ExposedPOP, 
                        n = 3)

# define the palette to show the elderly segment of the population 
pal65 <- colorQuantile(palette = "Reds", 
                        domain = mapData$ExposedPOP65, 
                        n = 3)


# here we create the Leaft map with two layers, one for all ages and one for elderly
  mapData %>%
    leaflet() %>%
      addProviderTiles(providers$CartoDB.Positron) %>% # these are the tiles for the background of the map - find more here http://leaflet-extras.github.io/leaflet-providers/preview/index.html 
      addCircleMarkers( # here we add the first layer, since our data has lat and lon, we can add the tabular data direcetly to a map
        radius = 3,
        color = ~ palPOP(ExposedPOP),
        stroke = FALSE,
        fillOpacity = .3,
        label = ~ExposedPOP, # here is text that will appear on mouseover
        popup = ~ExposedPOP, # this is text to appear on a click
        group = "All Ages"
      ) %>%  
      addCircleMarkers( # here we add the second layer, since our data has lat and lon, we can add the tabular data direcetly to a map
        radius =  ~ ExposedPOP65/max(mapData$ExposedPOP65)*10,
        color = ~ pal65(ExposedPOP65),
        stroke = FALSE,
        fillOpacity = .3,
        label = ~paste0(format(ExposedPOP65, big.mark = ","), " over 65 person-days of exposure in the Week the Camp Fire began"),
        popup = ~ExposedPOP65,
        group = "Over 65"
      ) %>%  
    addLayersControl(
                    baseGroups =  c("All Ages","Over 65"),
                    options = layersControlOptions(collapsed = FALSE)
                ) %>%
      setView(lat = 37.085206, lng = -119.540085, zoom = 6) %>% #defaults the view of the original map to show the entire state
      addEasyButton(easyButton(
        icon="fa-globe", title="Zoom to State",
        onClick=JS("function(btn, map){ map.setView([37.085206, -119.540085],6); }"))) # adds a button to return to the whole staet view
Assuming "LONGITUDE" and "LATITUDE" are longitude and latitude, respectively
Assuming "LONGITUDE" and "LATITUDE" are longitude and latitude, respectively

NA
NA

Please take a moment to provide us with feedback on this session.

Also, try some modifications and exercises on your own. Modify the code to * look at your own state * a specific county * other vulnerable segments of the population * other episodes of wildfire smoke * map more than two layers on ‘exposure’

---
title: "Processing HMS Wildfire Smoke Data"
output: html_notebook
---

This exercise was created for a workshop held on Feb 1st at the annual conference for the [International Society of Disease Surveillance]() on obtaining and examining Wildfire Smoke data. The GitHub Repository with this code can be found at []().

#### Workflow

To perform this analysis we'll be combining population data with wildfire smoke data (available daily) in order to approximate where, and how many people on the ground might have been exposed. Finally we'll consider some population characteristics to identify disparate burdens to vulnerable populations. 

Let's begin by loading some selected libraries that we'll use in this workshop. There are three included here: 

1.  [simple features](https://cran.r-project.org/web/packages/sf/index.html) for preforming spatial operations 
    + *[This is a good resource](https://geocompr.robinlovelace.net/) for performing spatial operations in R*
2. [tidyverse](https://www.tidyverse.org/) partly out of habit, but ever useful for wrangling and visualizing data 
    + *I also highly recommend [the accompanying book](https://r4ds.had.co.nz/)*
3. [data table](https://cran.r-project.org/web/packages/data.table/index.html) for speeding up the process of querying and manipulating large data sets
    + *This is a good [series of introductory vignettes](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) to the package*

```{r,results='hide'}
# install.packages(c("sf","tidyverse","date.table"))
library(sf)
library(tidyverse)
library(data.table)
```

#### Population Data
 
To know where people live and how many live there, we'll need the population data for our state.  For this we'll use the [US Census state population centers](https://www.census.gov/geo/reference/centersofpop.html) - these are population-weight centers to each tract in the state. For California we use the state FIPS code '06', you can change the code at the end of the URL below to use your state, and we'll define it explicitly so we can call on it later. We'll create the URL on the fly using the `paste0` function. Note: `fread` is the default reader from the `data.table` package and it allows us to read in this text file directly from a url.


```{r, results='hide'}
stateFIPs <- "06"
TRurl <- paste0("https://www2.census.gov/geo/docs/reference/cenpop2010/tract/CenPop2010_Mean_TR",stateFIPs,".txt")
TRcenters <- fread(TRurl)
```

Let's take a look at the data 
```{r}
TRcenters
```

Note that when reading in the data R automatically translated the FIPS codes to numbers and leading zeros were dropped

```{r}
class(TRcenters$STATEFP)
```

Alternatively we may wish to read in the geography FIPS as characters and combine them into a single string to more easily match them up with other data later. We'll do this by defining the class of the columns as character using `colClasses` within the `fread` function. Then combining those fields without separators, using `paste0`. The `%>%` operator is used with the tidyverse for piping commands, and `mutate` appends the dataset with new variables that we define - in this case a long identifier for the tract called *GEOID*. 
```{r, results='hide'}
TRcentersAlt <- fread(TRurl, 
                      colClasses = c("character",
                                     "character",
                                     "character",
                                     "integer",
                                     "double",
                                     "double")) %>%
                mutate(GEOID = paste0(STATEFP, COUNTYFP, TRACTCE))

```

Let's see how that looks
```{r}
TRcentersAlt
class(TRcentersAlt$STATEFP)
```
Now we have a table of tract info, but we need it to be a spatial object if we are going to be able to combine it with smoke information. We use the `sf` package for this. We'll use the Lat and Lon field already in the data and assign it the [Coordinate Reference System](https://mgimond.github.io/Spatial/coordinate-systems-in-r.html) (CRS) corresponding to [WGS84](http://spatialreference.org/ref/epsg/wgs-84/). Finally, we can plot the points and show population to verify that everything worked. 


```{r}
TRcentersSP <- st_as_sf(TRcenters,
                        coords = c("LONGITUDE", "LATITUDE"),
                        crs = 4326) 

plot(TRcentersSP["POPULATION"])

```

#### Wildfire Smoke Data
There are many potential sources for such data including monitor data, modeled data, and satellite (remotely sensed) data. For this exercise we will be looking at data from the National Oceanic and Atmospheric Administration's Office of Satellite and Product Operations. Specifically, [NOAA's Hazard Mapping System](https://www.ospo.noaa.gov/Products/land/hms.html) (HMS) Smoke Product. 

* The initial HMS product for the current day is created and updated by a satellite analyst roughly between 8am and 10am Eastern Time. After 10am, the analysis is fine-tuned as time permits as additional satellite data becomes available. Areas of smoke are analyzed and added to the analysis during daylight hours as visible satellite imagery becomes available. The product is finalized and 'completed' for the archive the following morning - generally by around 800am.
* The fire sizes depicted in the product are primarily determined by the field of view of the satellite instrument, or the resolution of the analysis tool. They should not be used to estimate specific fire perimeters.
* The ability to detect fires and smoke can be compromised by many factors, including cloud cover, tree canopy, terrain, the size of the fire or smoke plume, the time of the day, etc. The satellite sensors used to detect fires are sensitive to heat sources and reflected sunlight. Analysts do their best to distinguish between fires and other heat sources or highly reflective surfaces, such as factories, mines, gas flares, solar panels, clouds, etc. but some false detects do get included in the analysis.

##### Technical Documentation of the HMS Products 
* [Use of multiple satellite sensors in NOAA's operational near real-time fire and smoke detection and characterization program](https://www.researchgate.net/profile/Mark_Ruminski/publication/252456636_Use_of_multiple_satellite_sensors_in_NOAA%27s_operational_near_real-time_fire_and_smoke_detection_and_characterization_program/links/5498d67a0cf2c5a7e342c6e8.pdf)
* [Description and Verification of the NOAA Smoke Forecasting System: The 2007 Fire Season](https://journals.ametsoc.org/doi/pdf/10.1175/2008WAF2222165.1)

Here is a recent paper looking at the 2015 wildfire season in California which found 
* [Cardiovascular and Cerebrovascular Emergency Department Visits Associated With Wildfire Smoke Exposure in California in 2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6015400/pdf/JAH3-7-e007492.pdf)

------------
Let's first look at the recent (6 month) archive of HMS data available to download. It can be found here [https://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/](https://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/). You can see that the first date for which smoke data are available is in July 2018. There were several large fires in California this summer and fall and we'll be able to examine them with this 6 month archive (keep in mind if accessing this exercise after the Feb 1st ISDS Workshop the available dates will be different).

When looking at the archive, you can see that files for each day are in the ESRI shapefile format, where each day's spatial data consists of 3 files. In this chunk we'll download the files (*.dbf*,*.shp*, *.shx*) and read them in as a single spatial object. We use the base function `substr` to separate the full 'day' into parts that we may want to use later. Finally, we'll look at the fields in the data and produce a simple plot of it. 

```{r, results='hide'}
HMSday <- 20180801 # define the day in the format it appears in the  
  
year <- substr(HMSday, 1,4)
month <- substr(HMSday, 5,6)
day <- substr(HMSday, 7,8)

directory <- tempdir() # create a temporary directory to save the files as they're downloaded. 
  
setwd(directory) # set the working directory to the temp directory we just created

shpFiles <- list(paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".dbf"),
                 paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shp"),
                 paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shx"))

shpNames <- list("HMSGIS.dbf","HMSGIS.shp","HMSGIS.shx")
  
GISfiles <- mapply(download.file, # the function to execute
                   shpFiles, # a list of the urls to download
                   shpNames, # a list of the names of the destinations for the downloaded files
                   MoreArgs = list(method = "auto", mode = "wb")) # additional arguments for download.file function
  
Smk <- st_read(dsn = "HMSGIS.shp") %>%  st_set_crs(4326) # read in the shapefile and 

Smk
```

There are 5 fields in the shapefile. *Density* is the field that contains the information about the concentration of the smoke observed. We'll use that to take a preliminary look at the data.  Before we plot it, we'll reassign the values the HMS uses for *Density* to labels that make more sense for us. 

```{r}

Smk <- Smk %>%
  mutate(year = substr(HMSday, 1,4),
        month = substr(HMSday, 5,6),
        day = substr(HMSday, 7,8),
        date = paste0(year, month, day),
        smoke = factor(ifelse(Density %in% c(5, "5.000"), "light", 
                             ifelse(Density %in% c(27, "27.000") ,"heavy", 
                                    ifelse(Density %in% c(16, "16.000"), "medium", "missing"))),
                levels=c("light","medium","heavy","missing"))
         )

 plot(Smk["smoke"], 
      axes = TRUE, # include axes labels in the plot
      key.pos = 1) # place legend at the bottom

```
The last thing left to do is to combine the population data and the smoke information. This is a spatial operation, where we want to know which smoke plumes are over  the population centers that we first brought in. To do this we'll use the `st_intersection` function within the `sf` package. We'll also use a special function from the `lwgeom` package to ensure the the smoke polygons are valid and so that we do not encounter unnecessary errors. The out

```{r, results='hide'}
# install.packages("lwgeom")
library(lwgeom)

SmkTractDayInt <- st_intersection(st_make_valid(Smk),
                                  TRcentersSP) %>%
                        select(date,
                               year,
                               month,
                               day,
                               smoke,
                               STATEFP,
                               COUNTYFP,
                               TRACTCE,
                               POPULATION)

SmkTractDayInt
```
In examining the result we can see that a single tract from our population centers data can appear in the result multiple times, either for the same smoke level or for different smoke categories. This is because plumes in the HMS data can be stored as separate shapes, depending on the density of the smoke and the time/satellite from which it was derived. Conversely, some tracts do not appear in the result at all. We'll deal with that in a moment. For now, we'll make the decision here to consider an intersection with a plume to be an exposure for that tract and day. This will help to simplify the data and give us one observation for each tract-day-smoke combination in our result.


```{r}
      
st_geometry(SmkTractDayInt) <- NULL # remove the spatial information for the moment and if we want to work with it we can comment this line out
  
singleDay <- SmkTractDayInt  %>%
                mutate(yn = 1) %>% # since every observataion in our result is an instance of smoke with people, we'll give them all a value of 1
                group_by(date,
                         year,
                         month,
                         day,
                         smoke,
                         STATEFP,
                         COUNTYFP,
                         TRACTCE,
                         POPULATION) %>% # then we'll group by these variables 
                summarise(yn = mean(yn)) %>% # and our yes/no marker will be averaged for instances of common smoke categories
                spread(key = smoke,value = yn)  # lastly, let's restructure the data to a wider format, where each smoke category has its own column
                                                # this is not exactly tidy-compliant, but we'll be using this soon
                
unlink(paste0(tempdir(),'/*'))     # clear temp folder
      
```
Now we have a dataset for a single day and state, where we could quickly compute the number of people under different smoke categories on a single day.  

```{r}

sum(filter(singleDay, heavy == 1)$POPULATION)  # here is that quick calculation using the tidyr notation

as.data.table(singleDay)[heavy ==1, sum(POPULATION)] # here is that quick calculation using the data.table notation

```
Alternatively, we may want to look at which places have the most people exposed, or view all of the tracts under heavy smoke on this day.  

```{r}

filter(singleDay, heavy ==1) %>% arrange(POPULATION)

```
Wow, a lot of different counties in California had thousands of people exposed on this day. 

Recall that when we intersected the tract centroids and the smoke layer we lost tracts with no smoke on that day. It may also be useful to have these, if to only make it easier to see the entire state and population centers when mapping. But we may also want to quantify the percent of the total state or county population exposed. To do this we'd like for the result of our intersection to contain some information for every combination of date-tract-smoke. 

```{r}
singleDayFull <- left_join(TRcenters, singleDay) %>% 
                replace_na(list(date = HMSday, 
                                year = substr(HMSday, 1,4),
                                month = substr(HMSday, 5,6),
                                day = substr(HMSday, 7,8),
                                light = 0, 
                                medium = 0, 
                                heavy = 0))

```

Now we have a table of the full list of our state's tracts, the number of people in that tract, and the smoke level they experienced on that day. This is all well and good, but we'd like to make use of computational power to iterate and look at an episode or entire fire season. We already have the commands that we need, we only have to put them into a function that we can apply to several dates or states. The script in this function is just copied (and adapted slightly) from the chunks above. 

```{r}

processHMSday <- function(HMSday){
  
    year <- substr(HMSday, 1,4)
    month <- substr(HMSday, 5,6)
    day <- substr(HMSday, 7,8)
    
    directory <- tempdir() # create a temporary directory to save the files as they're downloaded. 
      
    setwd(directory) # set the working directory to the temp directory we just created
  
    shpFiles <- list(paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".dbf"),
                     paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shp"),
                     paste0("http://satepsanone.nesdis.noaa.gov/pub/FIRE/HMS/GIS/ARCHIVE/hms_smoke",HMSday,".shx"))
  
    shpNames <- list("HMSGIS.dbf","HMSGIS.shp","HMSGIS.shx")
      
    GISfiles <- mapply(download.file, # the function to execute
                       shpFiles, # a list of the urls to download
                       shpNames, # a list of the names of the destinations for the downloaded files
                       MoreArgs = list(method = "auto", mode = "wb")) # additional arguments for download.file function
      
    Smk <- st_read(dsn = "HMSGIS.shp") %>%  
      st_set_crs(4326) %>%
      mutate(year = substr(HMSday, 1,4),
        month = substr(HMSday, 5,6),
        day = substr(HMSday, 7,8),
        date = paste0(year, month, day),
        smoke = factor(ifelse(Density %in% c(5, "5.000"), "light", 
                             ifelse(Density %in% c(27, "27.000") ,"heavy", 
                                    ifelse(Density %in% c(16, "16.000"), "medium", "missing"))),
                levels=c("light","medium","heavy","missing")))
    
    SmkTractDayInt <- st_intersection(st_make_valid(Smk),
                                      TRcentersSP) %>%
                      select(date,
                             year,
                             month,
                             day,
                             smoke,
                             STATEFP,
                             COUNTYFP,
                             TRACTCE,
                             POPULATION) %>%
                    mutate(yn = 1) %>% 
                    group_by(date,
                             year,
                             month,
                             day,
                             smoke,
                             STATEFP,
                             COUNTYFP,
                             TRACTCE,
                             POPULATION) %>% 
                    summarise(yn = mean(yn)) %>% 
                    spread(key = smoke,value = yn) 
    
    st_geometry(SmkTractDayInt) <- NULL
    
    singleFull <- left_join(TRcenters, SmkTractDayInt) %>% 
                        replace_na(list(date = HMSday, 
                                        year = substr(HMSday, 1,4),
                                        month = substr(HMSday, 5,6),
                                        day = substr(HMSday, 7,8),
                                        light = 0, 
                                        medium = 0, 
                                        heavy = 0))
                    
    unlink(paste0(tempdir(),'/*'))
    
    return(singleFull)
    
}


```

Now we can run the function for a list of dates and combine them to have a larger table of smoke. We can do this for the first week of the Camp Fire which started on Nov 8, 2018. 

```{r, results='hide'}

dateList <- 20181108:20181116

campFire <- rbindlist(lapply(dateList, FUN = processHMSday), fill = TRUE)
```
--------------------

From the original tract population centers used earlier we have the total (2010) populations in each tract that we can use to produce estimates of the number of people exposed to heavy wildfire smoke. It may also be useful to understand the vulnerable populations that may be particularly impacted in a single event. For this we can dynamically obtain this data on the fly using the [American Community Survey API](https://www.census.gov/data/developers/data-sets/acs-5year.2016.html) for 2016. You can see the full list of 2016 ACS variables available through the API here [https://api.census.gov/data/2016/acs/acs5/variables.html](https://api.census.gov/data/2016/acs/acs5/variables.html). 

Recall that [this study of the 2015 California Wildfire Season](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6015400/pdf/JAH3-7-e007492.pdf) found significantly elevated risk of stroke and heart attack for those over 65 one day after exposure to heavy wildfire smoke (using HMS). Let's begin by grabbing this data from ACS.  Notice that a new package is needed to read in the data from the ACS's API in JSON format.

```{r}
# install.packages("jsonlite")
library(jsonlite)

variable <- "B09020_001E" # this is a placeholde for the ACS varibale we'll read in, t in this case population over 65
# "B17021_016E" # in poverty and living alone


# first read in the data from the ACS and store as a data frome
#examining the data will show that the first row is the names of the data
elderly <-
    as.data.frame(fromJSON(
      paste0("https://api.census.gov/data/2016/acs/acs5?get=NAME,",variable,"&for=tract:*&in=state:",stateFIPs,"%20county:*"
      ), flatten = T
    )) 

# elderlyName <- fromJSON(paste0("https://api.census.gov/data/2016/acs/acs5/variables/",variable,".json"), flatten = T)$label

# here we can elimiate the uneccessary first row and column AND...
# change the name and type of the variables to match our table of smoke exposures
elderly <- elderly[-1,-1] %>% mutate(estimate = as.integer(as.character(V2)), 
                               STATEFP = as.integer(as.character(V3)),
                               COUNTYFP = as.integer(as.character(V4)),
                               TRACTCE = as.integer(as.character(V5))) %>%
                        select(STATEFP, COUNTYFP, TRACTCE, estimate)

elderly

```
Next we'll combine the smoke and vulnerable population data we just grabbed from the ACS. And begin to visualize them. 
```{r}
elderlySmoke <- merge(campFire, elderly)

```

To begin visualizing the data, let's first produce a timeline of the number of people exposed. We'll summarize the data by day.

```{r}
elderlySmoke %>%
  group_by(day) %>%
  summarise(lightPDs = sum(light*POPULATION, na.rm = T),
            mediumPDs = sum(medium*POPULATION, na.rm = T),
            heavyPDs = sum(heavy*POPULATION, na.rm = T)
  ) %>%
  gather(lightPDs, mediumPDs, heavyPDs, key = smoke, value = PersonDays) %>%
  mutate(smoke = factor(x = smoke, levels = c("lightPDs", "mediumPDs", "heavyPDs"))) %>%
       ggplot(aes(
         x = day,
         y = PersonDays/1000000, 
         fill = smoke
       )) +
       geom_bar(stat="identity", position="dodge") + 
       scale_fill_manual(values = c("yellow","orange","red")) +
       ylab("Millions of Persons Exposed") + 
       xlab("Nov day") 
    
    
```

We could also focus on the elderly exposed over the same period

```{r}
elderlySmoke %>%
  group_by(day) %>%
  summarise(heavyUnder65= sum(heavy*(POPULATION-estimate), na.rm = T),
            heavyOver65 = sum(heavy*estimate, na.rm = T)
  ) %>%
  gather(heavyUnder65, heavyOver65, key = smoke, value = PersonDays) %>%
  mutate(smoke = factor(x = smoke, levels = c("heavyUnder65", "heavyOver65"))) %>%
       ggplot(aes(
         x = day,
         y = PersonDays/1000000, 
         fill = smoke
       )) +
       geom_bar(stat="identity", position="stack") + 
       scale_fill_manual(values = c("pink","red")) +
       ylab("Millions of Persons Exposed") + 
       xlab("Nov day") 
    
    
```
Another really useful way to look at the data will be on a map. For this we'll summarize the data by location. Also, where we used `plot` previously this was fine for getting a sense of the information but not very useful for actually interacting with it. Here, we'll implement a package that uses the java script library [Leaflet](https://rstudio.github.io/leaflet/). This will produce a map that is much more familiar to web users. 

```{r}
# install.packages("leaflet")
library(leaflet)

# first sumarize the data we want by location, aggregating person days of exposure by tract

mapData <- elderlySmoke %>%
  group_by(STATEFP, COUNTYFP, TRACTCE, LATITUDE, LONGITUDE) %>%
  summarise(ExposedPOP = sum(heavy*POPULATION, na.rm = T),
            ExposedPOP65 = sum(heavy*estimate, na.rm = T)
  )

# define the palette for the results showing full population 
palPOP <- colorQuantile(palette = "Blues", 
                        domain = mapData$ExposedPOP, 
                        n = 3)

# define the palette to show the elderly segment of the population 
pal65 <- colorQuantile(palette = "Reds", 
                        domain = mapData$ExposedPOP65, 
                        n = 3)


# here we create the Leaft map with two layers, one for all ages and one for elderly
  mapData %>%
    leaflet() %>%
      addProviderTiles(providers$CartoDB.Positron) %>% # these are the tiles for the background of the map - find more here http://leaflet-extras.github.io/leaflet-providers/preview/index.html 
      addCircleMarkers( # here we add the first layer, since our data has lat and lon, we can add the tabular data direcetly to a map
        radius = 3,
        color = ~ palPOP(ExposedPOP),
        stroke = FALSE,
        fillOpacity = .3,
        label = ~ExposedPOP, # here is text that will appear on mouseover
        popup = ~ExposedPOP, # this is text to appear on a click
        group = "All Ages"
      ) %>%  
      addCircleMarkers( # here we add the second layer, since our data has lat and lon, we can add the tabular data direcetly to a map
        radius =  ~ ExposedPOP65/max(mapData$ExposedPOP65)*10,
        color = ~ pal65(ExposedPOP65),
        stroke = FALSE,
        fillOpacity = .3,
        label = ~paste0(format(ExposedPOP65, big.mark = ","), " over 65 person-days of exposure in the Week the Camp Fire began"),
        popup = ~ExposedPOP65,
        group = "Over 65"
      ) %>%  
    addLayersControl(
                    baseGroups =  c("All Ages","Over 65"),
                    options = layersControlOptions(collapsed = FALSE)
                ) %>%
      setView(lat = 37.085206, lng = -119.540085, zoom = 6) %>% #defaults the view of the original map to show the entire state
      addEasyButton(easyButton(
        icon="fa-globe", title="Zoom to State",
        onClick=JS("function(btn, map){ map.setView([37.085206, -119.540085],6); }"))) # adds a button to return to the whole staet view
    

```
## Please [take a moment to provide us with feedback](https://goo.gl/forms/hpUApy0gP98O8S5J2) on this session.

Also, try some modifications and exercises on your own. Modify the code to 
* look at your own state
* a specific county
* other vulnerable segments of the population
* other episodes of wildfire smoke
* map more than two layers on 'exposure'
